Data Processing Workflow

In this workflow within PyCCAPT, we can crop the data, perform the voltage and bowl calibration, calculate the 3D reconstruction, and do the ranging.

In [1]:
# Activate interactive functionality of matplotlib
%matplotlib ipympl
# Activate auto reload
%load_ext autoreload
%autoreload 2
%reload_ext autoreload
# import libraries
import os
import numpy as np
from ipywidgets import widgets
from IPython.display import display
from ipywidgets import fixed, interact_manual
import warnings
# Ignore all warnings
warnings.filterwarnings("ignore")

# Local module and scripts
from pyccapt.calibration.calibration_tools import widgets as wd
from pyccapt.calibration.data_tools import data_tools, data_loadcrop, dataset_path_qt
from pyccapt.calibration.tutorials.tutorials_helpers import helper_calibration
from pyccapt.calibration.tutorials.tutorials_helpers import helper_data_loader
from pyccapt.calibration.tutorials.tutorials_helpers import helper_temporal_crop
from pyccapt.calibration.tutorials.tutorials_helpers import helper_special_crop
from pyccapt.calibration.tutorials.tutorials_helpers import helper_t_0_tune
from pyccapt.calibration.tutorials.tutorials_helpers import helper_mc_plot
from pyccapt.calibration.tutorials.tutorials_helpers import helper_3d_reconstruction
from pyccapt.calibration.tutorials.tutorials_helpers import helper_ion_selection
from pyccapt.calibration.tutorials.tutorials_helpers import helper_visualization
from pyccapt.calibration.tutorials.tutorials_helpers import helper_ion_list
If you receive an error about the pytables library, install it with conda. To do that, open a new cell, copy the lines below into it, and run it like any other cell. The pytables library will then be installed.

import sys
!conda install --yes --prefix {sys.prefix} pytables

By clicking on the button below, you can select the dataset file you want to crop. The dataset file can be in various formats, including HDF5, EPOS, POS, ATO, and CSV. The cropped data will be saved in the same directory as the original dataset file, in a new directory named load_crop. The cropped dataset file keeps the name of the original dataset file. The figures will be saved in the same directory as the dataset file.

In [2]:
button = widgets.Button(
    description='load dataset',
)
@button.on_click
def open_file_on_click(b):
    """
    Event handler for button click event.
    Prompts the user to select a dataset file and stores the selected file path in the global variable dataset_path.
    """
    global dataset_path
    dataset_path = dataset_path_qt.gui_fname().decode('ASCII')
button
Out[2]:

ROI Selection and Data Cropping

From the dropdown lists below, you can select the instrument specifications of the dataset. These are the same specifications used for the calibration process. The data mode specifies the dataset structure, which can be raw or calibrated. The flight path length is the distance between the sample and the detector. The t0 is the time of flight of the ions with the lowest mass-to-charge ratio. The maximum mass-to-charge ratio is the largest value that you want to plot; you can also change it in the related cells. The detector diameter is the diameter of the detector.
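
The role of the flight path length and t0 can be illustrated with the idealized time-of-flight relation m/n = 2 e V ((t - t0) / L)^2. The block below is a minimal sketch under that assumption only. The function names `tof_to_mc` and `mc_to_tof` and their parameters are hypothetical and are not part of the PyCCAPT API, which may apply further corrections (for example for the pulse voltage or the hit position).

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg


def tof_to_mc(t_ns, voltage_v, flight_path_mm, t0_ns=0.0):
    """Idealized mass-to-charge ratio (Da) from a time of flight (ns)."""
    t_s = (t_ns - t0_ns) * 1e-9
    length_m = flight_path_mm * 1e-3
    # Energy balance n*e*V = m*v^2/2 with v = L / (t - t0)
    mc_kg = 2.0 * ELEMENTARY_CHARGE * voltage_v * (t_s / length_m) ** 2
    return mc_kg / ATOMIC_MASS_UNIT


def mc_to_tof(mc_da, voltage_v, flight_path_mm, t0_ns=0.0):
    """Inverse relation: expected time of flight (ns) for a given m/c (Da)."""
    length_m = flight_path_mm * 1e-3
    t_s = length_m * math.sqrt(
        mc_da * ATOMIC_MASS_UNIT / (2.0 * ELEMENTARY_CHARGE * voltage_v)
    )
    return t_s * 1e9 + t0_ns


# Illustrative values only: an Al+ ion (26.98 Da), 5 kV, 110 mm flight path, t0 = 50 ns
t_al = mc_to_tof(26.98, voltage_v=5000.0, flight_path_mm=110.0, t0_ns=50.0)
mc_al = tof_to_mc(t_al, voltage_v=5000.0, flight_path_mm=110.0, t0_ns=50.0)
```

Under the same idealization, a t0 estimate for a known peak is the observed time of flight minus `mc_to_tof(mc_known, V, L)` evaluated with `t0_ns=0`.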

In [3]:
# create an object for selection of instrument specifications of the dataset
tdc, pulse_mode, flight_path_length, t0, max_mc, det_diam = wd.dataset_instrument_specification_selection()

# Display lists and comboboxes to select the instrument specifications
display(tdc, pulse_mode, flight_path_length, t0, max_mc)
In [6]:
variables = helper_data_loader.load_data(dataset_path, max_mc.value, flight_path_length.value, pulse_mode.value, tdc.value)
display(variables.data)
display(variables.range_data)
The maximum possible TOF is: 5010 ns
=============================
The data will be saved on the path: D:/pyccapt/tests/data/data_1642_Aug-30-2023_16-05_Al_test4/data_processing/
=============================
The dataset name after saving is: data_1642_Aug-30-2023_16-05_Al_test4
=============================
The figures will be saved on the path: D:/pyccapt/tests/data/data_1642_Aug-30-2023_16-05_Al_test4/data_processing/
=============================
{'apt': ['high_voltage', 'main_chamber_vacuum', 'num_events', 'pulse', 'temperature', 'time_counter'], 'dld': ['high_voltage', 'pulse', 'start_counter', 't', 'x', 'y'], 'tdc': ['channel', 'high_voltage', 'pulse', 'start_counter', 'time_data'], 'time': ['time_h', 'time_m', 'time_s']}
The number of data over max_tof: 245
Total number of Ions: 12312751
high_voltage (V) pulse start_counter t (ns) x_det (cm) y_det (cm)
0 600.000000 328.0 8202 2537.802979 1.080816 0.006531
1 615.000000 328.0 14741 3686.929443 1.443265 -1.812245
2 624.979980 328.0 2657 3110.466553 -0.688980 -2.249796
3 624.979980 328.0 4568 1171.380737 0.192653 -0.914286
4 634.919983 328.0 4498 2703.307129 0.058776 1.479184
... ... ... ... ... ... ...
12312746 8000.000000 1600.0 11089 3722.090332 2.282449 2.798367
12312747 8000.000000 1600.0 13935 3065.292725 3.725714 -0.675918
12312748 8000.000000 1600.0 2722 2561.627686 3.229388 1.573878
12312749 8000.000000 1600.0 3387 3579.656494 0.414694 2.693877
12312750 8000.000000 1600.0 14288 2206.904297 1.244082 -2.847347

12312751 rows × 6 columns

ion mass mc mc_low mc_up color element complex isotope charge
0 unranged 0.0 0.0 0.0 400.0 #000000 unranged 0 0 0
In [ ]:
# Load previously processed data and range data, if they exist
try:
    data_file = variables.result_data_path + '/' + variables.result_data_name + '.h5'
    range_file = variables.result_data_path + '/' + 'range_' + variables.dataset_name + '.h5'
    if os.path.exists(data_file):
        variables.data = data_tools.load_data(data_file, tdc='pyccapt', mode='processed')
        print('Continue from the point based on the loaded data')
    else:
        print('No data available')
    if os.path.exists(range_file):
        variables.range_data = data_tools.read_hdf5_through_pandas(range_file)
        print('Continue from the point based on the loaded range data')
    else:
        print('No range data available')
    # Extract the needed data from the pandas DataFrame as NumPy arrays
    data_tools.extract_data(variables.data, variables, flight_path_length.value, max_mc.value)
    display(variables.data)
    display(variables.range_data)
except Exception as e:
    # Report the problem instead of silently ignoring it
    print('Could not load the saved data:', e)

Temporal crop

Select the data by drawing a rectangle over the experiment history. The experiment history is a 2D histogram of the time of flight of the ions versus the evaporation sequence. It is plotted by clicking on the button below the cell.

In [7]:
helper_temporal_crop.call_plot_crop_experiment(variables, pulse_mode.value)
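
The idea behind the temporal crop can be sketched with NumPy alone: build a 2D histogram of time of flight against the ion sequence index, then keep the ions that fall inside a rectangle. This is an illustration on synthetic data. The function names and the rectangle limits are hypothetical, and the PyCCAPT helper performs the selection interactively rather than through these calls.

```python
import numpy as np


def experiment_history(tof_ns, bins=(200, 200)):
    """2D histogram of time of flight versus evaporation sequence index."""
    tof_ns = np.asarray(tof_ns, dtype=float)
    sequence = np.arange(tof_ns.size)
    counts, seq_edges, tof_edges = np.histogram2d(sequence, tof_ns, bins=bins)
    return counts, seq_edges, tof_edges


def temporal_crop_mask(tof_ns, seq_range, tof_range):
    """Boolean mask of the ions inside a rectangle on the history plot."""
    tof_ns = np.asarray(tof_ns, dtype=float)
    sequence = np.arange(tof_ns.size)
    in_seq = (sequence >= seq_range[0]) & (sequence <= seq_range[1])
    in_tof = (tof_ns >= tof_range[0]) & (tof_ns <= tof_range[1])
    return in_seq & in_tof


# Synthetic times of flight, for illustration only
rng = np.random.default_rng(0)
tof = rng.uniform(0.0, 5000.0, size=10_000)

counts, _, _ = experiment_history(tof)
# Hypothetical rectangle: drop the first and last 1000 ions, keep every TOF
mask = temporal_crop_mask(tof, seq_range=(1_000, 8_999), tof_range=(0.0, 5000.0))
cropped_tof = tof[mask]
```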

Spatial crop

Select the region with the highest concentration of ions in the plot below to keep only the relevant data. To crop, draw a circle over the field desorption map. The field desorption map is a 2D histogram of the ion hit positions on the detector. It is plotted by clicking on the button below the cell.

In [8]:
helper_special_crop.call_plot_crop_fdm(variables)
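
A circular region of interest on the detector reduces to a distance test on the hit coordinates. The sketch below assumes the map is a 2D histogram of the detector hit positions. The function names, the center, and the radius are hypothetical values chosen for illustration and are not taken from PyCCAPT.

```python
import numpy as np


def field_desorption_map(x_det, y_det, bins=256):
    """2D histogram of the detector hit positions."""
    counts, x_edges, y_edges = np.histogram2d(x_det, y_det, bins=bins)
    return counts, x_edges, y_edges


def circular_crop_mask(x_det, y_det, center, radius):
    """Boolean mask of the hits inside (or on) a circle on the detector."""
    x_det = np.asarray(x_det, dtype=float)
    y_det = np.asarray(y_det, dtype=float)
    dist_sq = (x_det - center[0]) ** 2 + (y_det - center[1]) ** 2
    return dist_sq <= radius ** 2


# Four illustrative hits (cm); the circle is centered on the detector
x_hits = np.array([0.0, 1.0, 3.0, -2.9])
y_hits = np.array([0.0, 1.0, 0.0, 0.0])
fdm_counts, _, _ = field_desorption_map(x_hits, y_hits, bins=8)
roi_mask = circular_crop_mask(x_hits, y_hits, center=(0.0, 0.0), radius=2.5)
```

Applying `roi_mask` to every per-ion array keeps the columns aligned after the crop.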

Calculate the number of pulses since the last event pulse and the number of ions per pulse. The percentage of data lost in the ROI selection process will be printed.

In [9]:
pulse_pi, ion_pp = data_loadcrop.calculate_ppi_and_ipp(variables.data)

# add the two calculated arrays to the cropped dataset
variables.data['pulse_pi'] = pulse_pi.astype(np.uintc)
variables.data['ion_pp'] = ion_pp.astype(np.uintc)

# extract the needed data from the pandas DataFrame as NumPy arrays
variables.dld_high_voltage = variables.data['high_voltage (V)'].to_numpy()
variables.dld_pulse = variables.data['pulse'].to_numpy()
variables.dld_t = variables.data['t (ns)'].to_numpy()
variables.dld_x_det = variables.data['x_det (cm)'].to_numpy()
variables.dld_y_det = variables.data['y_det (cm)'].to_numpy()

# report the crop loss
# NOTE: both lengths below refer to the same (already cropped) DataFrame, so this always prints 0.00 %;
# compare against the row count recorded before cropping to get the actual loss
print('tof Crop Loss {:.2f} %'.format((100 - (len(variables.data) / len(variables.data)) * 100)))
# fraction (not percent) of entries whose ions-per-pulse value is not 1
print('percentage of double event per pulse', len(ion_pp[ion_pp != 1]) / float(len(ion_pp)))
tof Crop Loss 0.00 %
percentage of double event per pulse 0.018139624455383564
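
The two per-ion quantities can be derived from the pulse counter alone. The sketch below assumes one convention: ions sharing a counter value belong to the same pulse, the first ion of each pulse carries the pulse gap and the ion count, and the remaining ions of that pulse carry zeros. This convention and the function name are assumptions for illustration; `calculate_ppi_and_ipp` may differ, and counter wrap-around is not handled here.

```python
import numpy as np


def pulses_and_ions_per_pulse(start_counter):
    """Return (pulses since last event pulse, ions per pulse) for each ion."""
    start_counter = np.asarray(start_counter, dtype=np.int64)
    n = start_counter.size
    pulse_pi = np.zeros(n, dtype=np.int64)
    ion_pp = np.zeros(n, dtype=np.int64)
    if n == 0:
        return pulse_pi, ion_pp

    # Mark the first ion of every run of identical counter values
    is_first = np.ones(n, dtype=bool)
    is_first[1:] = start_counter[1:] != start_counter[:-1]
    first_idx = np.flatnonzero(is_first)

    # Number of ions detected on each event pulse
    ion_pp[first_idx] = np.diff(np.append(first_idx, n))
    # Pulses elapsed since the previous event pulse (0 for the very first)
    pulse_pi[first_idx[1:]] = np.diff(start_counter[first_idx])
    return pulse_pi, ion_pp


# Illustrative counter values: pulses 10 and 40 produced multiple ions
example_counter = [10, 10, 15, 40, 40, 40, 41]
example_ppi, example_ipp = pulses_and_ions_per_pulse(example_counter)
```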

In the next cell, you can correct the position of the H1 peak or any other known peak by changing the t0 value. This correction helps position the peaks in the m/c calibration process.

In [10]:
helper_t_0_tune.call_fine_tune_t_0(variables, flight_path_length, pulse_mode, t0)

Remove the data with m/c greater than the maximum m/c and with x, y, and t equal to zero. Also add the columns needed for calibration. The data types of the final cropped dataset are displayed below.

In [11]:
# add the columns in the dataset for x, y, z, mc, mc calibrated, and t calibrated
helper_data_loader.add_columns(variables, max_mc)
# save data temporarily
data_tools.save_data(variables.data, variables, hdf=True, epos=False, pos=False, ato_6v=False, csv=False)
display(variables.data)
display(variables.data.dtypes)
The number of data over max_mc: 695846
The number of data with having t, x, and y equal to zero is: 0
x (nm) y (nm) z (nm) mc_c (Da) mc (Da) high_voltage (V) pulse start_counter t_c (ns) t (ns) x_det (cm) y_det (cm) pulse_pi ion_pp
0 0.0 0.0 0.0 0.0 28.724681 5013.479980 1002.695984 4162 0.0 616.458740 2.651428 0.666122 0 0
1 0.0 0.0 0.0 0.0 29.664233 5013.479980 1002.695984 4221 0.0 627.266968 -2.847347 -0.150204 59 2
2 0.0 0.0 0.0 0.0 28.994070 5013.479980 1002.695984 4414 0.0 605.204773 1.142857 0.189388 193 1
3 0.0 0.0 0.0 0.0 28.934313 5013.479980 1002.695984 4438 0.0 606.110046 1.394286 0.212245 24 1
4 0.0 0.0 0.0 0.0 29.575221 5013.479980 1002.695984 4728 0.0 619.249939 -1.325714 -1.799184 290 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10328192 0.0 0.0 0.0 0.0 30.336661 6353.959961 1270.791992 8823 0.0 560.408325 1.707755 1.280000 46 1
10328193 0.0 0.0 0.0 0.0 29.042071 6353.959961 1270.791992 8970 0.0 543.112427 1.028571 -0.698776 147 1
10328194 0.0 0.0 0.0 0.0 30.381020 6353.959961 1270.791992 9039 0.0 564.934631 -1.648980 1.962449 69 1
10328195 0.0 0.0 0.0 0.0 14.419212 6353.959961 1270.791992 9654 0.0 398.833862 1.377959 -1.502041 346 1
10328196 0.0 0.0 0.0 0.0 29.078471 6353.959961 1270.791992 9901 0.0 547.645569 -1.162449 -1.505306 247 1

10328197 rows × 14 columns

x (nm)              float64
y (nm)              float64
z (nm)              float64
mc_c (Da)           float64
mc (Da)             float64
high_voltage (V)    float64
pulse               float64
start_counter        uint32
t_c (ns)            float64
t (ns)              float64
x_det (cm)          float64
y_det (cm)          float64
pulse_pi             uint32
ion_pp               uint32
dtype: object


Time-of-Flight Calibration

Below you can plot the ToF histogram of the dataset. Select the peak range you want to use by holding the left mouse button and drawing a rectangle over the peak. Then apply the voltage and bowl calibration. Repeat these three steps until the peak resolution no longer improves. You can also save the calibration by clicking on the save button. The saved calibration will be used in the next steps.
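
As a rough illustration of what a voltage correction does, the sketch below fits a polynomial to the time of flight of one selected peak as a function of voltage and divides out the fitted trend. It is a generic example on synthetic data and not PyCCAPT's implementation. The function name, the polynomial degree, and the drift model are all assumptions.

```python
import numpy as np


def voltage_correction(t_peak, v_peak, t_all, v_all, degree=2):
    """Flatten the voltage dependence of a peak's time of flight.

    A polynomial t(V) is fitted to the ions of one selected peak, and every
    time of flight is divided by the fitted trend normalized to its mean.
    """
    t_peak = np.asarray(t_peak, dtype=float)
    v_peak = np.asarray(v_peak, dtype=float)
    t_all = np.asarray(t_all, dtype=float)
    v_all = np.asarray(v_all, dtype=float)

    v_scale = v_peak.mean()  # rescale voltages to keep the fit well conditioned
    coeffs = np.polyfit(v_peak / v_scale, t_peak, degree)
    trend_mean = np.polyval(coeffs, v_peak / v_scale).mean()
    factor = np.polyval(coeffs, v_all / v_scale) / trend_mean
    return t_all / factor


# Synthetic peak whose position drifts with voltage, plus small jitter
rng = np.random.default_rng(1)
voltage = np.linspace(5000.0, 6500.0, 2000)
tof_peak = 570.0 * (1.0 + 2e-5 * (voltage - 5000.0)) + rng.normal(0.0, 0.2, voltage.size)
tof_corrected = voltage_correction(tof_peak, voltage, tof_peak, voltage)
```

A bowl correction follows the same pattern, with the fit done over the detector hit position instead of the voltage.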

In [12]:
# extract the needed data from the pandas DataFrame as NumPy arrays
data_tools.extract_data(variables.data, variables, flight_path_length.value, max_mc.value)
calibration_mode = widgets.Dropdown(
    options=[('time_of_flight', 'tof'), ('mass_to_charge', 'mc')],
    description='calibration mode:')
display(calibration_mode)
The maximum time of flight: 5010
In [13]:
helper_calibration.call_voltage_bowl_calibration(variables, det_diam, calibration_mode)
In [14]:
variables.dld_t_calib_backup = np.copy(variables.dld_t_calib)
variables.mc_calib_backup = np.copy(variables.mc_calib)
In [15]:
helper_ion_list.call_ion_list(variables, selector='peak', calibration_mode=calibration_mode)
In [16]:
helper_mc_plot.call_mc_plot(variables, selector='None')
In [17]:
variables.data['mc_c (Da)'] = variables.mc_calib
variables.data['t_c (ns)'] = variables.dld_t_calib
# Remove ions with non-positive mc (mc <= threshold)
threshold = 0
mc_t = variables.data['mc_c (Da)'].to_numpy()
mc_t_mask = (mc_t <= threshold)
print('The number of ions with negative mc are:', np.count_nonzero(mc_t_mask))
variables.data.drop(np.where(mc_t_mask)[0], inplace=True)
variables.data.reset_index(inplace=True, drop=True)
# save data temporarily
data_tools.save_data(variables.data, variables, hdf=True, epos=False, pos=False, ato_6v=False, csv=False)
The number of ions with negative mc are: 11427


3D Reconstruction

After the bowl and voltage calibration, we are ready to calculate the 3D reconstruction. In this workflow we calculate the reconstructed x, y, and z coordinates and then plot the 3D view, heatmap, projection plots, and mass-to-charge histogram.
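
For orientation, the following is a minimal sketch of a simplified voltage-based point-projection reconstruction in the small-angle form commonly attributed to Bas et al. It is not necessarily the algorithm PyCCAPT uses. The function name and all parameter values (field factor, image compression factor, efficiency, evaporation field, atomic volume, detector size) are illustrative assumptions.

```python
import numpy as np


def reconstruct_point_projection(x_det_nm, y_det_nm, voltage_v, flight_path_nm,
                                 field_v_per_nm, atomic_volume_nm3,
                                 detector_area_nm2, k_f=4.0, xi=1.5,
                                 efficiency=0.5):
    """Simplified point-projection reconstruction; all lengths in nm."""
    x_det_nm = np.asarray(x_det_nm, dtype=float)
    y_det_nm = np.asarray(y_det_nm, dtype=float)
    voltage_v = np.asarray(voltage_v, dtype=float)

    radius = voltage_v / (k_f * field_v_per_nm)     # tip radius R = V / (k_f F)
    magnification = flight_path_nm / (xi * radius)  # M = L / (xi R)
    x = x_det_nm / magnification
    y = y_det_nm / magnification

    # Depth increment per detected ion, accumulated in detection order
    dz = atomic_volume_nm3 * magnification ** 2 / (efficiency * detector_area_nm2)
    z_tip = np.cumsum(dz)
    # Extra depth caused by the curvature of the emitting surface
    z_curv = radius * (1.0 - np.sqrt(1.0 - (x ** 2 + y ** 2) / radius ** 2))
    return x, y, z_tip + z_curv


NM_PER_CM = 1e7
# Three illustrative hits: center, 2 cm off-axis, center
x_det = np.array([0.0, 2.0, 0.0]) * NM_PER_CM
y_det = np.zeros(3)
voltage = np.full(3, 5000.0)
x_rec, y_rec, z_rec = reconstruct_point_projection(
    x_det, y_det, voltage,
    flight_path_nm=110e6,                           # assumed 110 mm flight path
    field_v_per_nm=19.0,                            # assumed evaporation field
    atomic_volume_nm3=0.0166,                       # assumed atomic volume
    detector_area_nm2=np.pi * (4 * NM_PER_CM) ** 2, # assumed 8 cm detector
)
```

In this sketch the depth grows with every detected ion, and off-axis ions are placed deeper than on-axis ions detected at a similar time.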

In [18]:
# extract the needed data from the pandas DataFrame as NumPy arrays
data_tools.extract_data(variables.data, variables, flight_path_length.value, max_mc.value)

display(variables.data)
The maximum time of flight: 5010
x (nm) y (nm) z (nm) mc_c (Da) mc (Da) high_voltage (V) pulse start_counter t_c (ns) t (ns) x_det (cm) y_det (cm) pulse_pi ion_pp
0 0.0 0.0 0.0 27.092448 28.724681 5013.479980 1002.695984 4162 571.235437 616.458740 2.651428 0.666122 0 0
1 0.0 0.0 0.0 26.948118 29.664233 5013.479980 1002.695984 4221 569.811044 627.266968 -2.847347 -0.150204 59 2
2 0.0 0.0 0.0 26.994150 28.994070 5013.479980 1002.695984 4414 570.265747 605.204773 1.142857 0.189388 193 1
3 0.0 0.0 0.0 26.998421 28.934313 5013.479980 1002.695984 4438 570.307919 606.110046 1.394286 0.212245 24 1
4 0.0 0.0 0.0 27.333027 29.575221 5013.479980 1002.695984 4728 573.601335 619.249939 -1.325714 -1.799184 290 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10316765 0.0 0.0 0.0 28.297223 30.336661 6353.959961 1270.791992 8823 582.980918 560.408325 1.707755 1.280000 46 1
10316766 0.0 0.0 0.0 27.105522 29.042071 6353.959961 1270.791992 8970 571.364274 543.112427 1.028571 -0.698776 147 1
10316767 0.0 0.0 0.0 27.549218 30.381020 6353.959961 1270.791992 9039 575.718565 564.934631 -1.648980 1.962449 69 1
10316768 0.0 0.0 0.0 13.681184 14.419212 6353.959961 1270.791992 9654 416.828667 398.833862 1.377959 -1.502041 346 1
10316769 0.0 0.0 0.0 26.842914 29.078471 6353.959961 1270.791992 9901 568.770402 547.645569 -1.162449 -1.505306 247 1

10316770 rows × 14 columns

Select the main element of your sample from the dropdown below.

In [19]:
element_selected = wd.density_field_selection()
display(element_selected)

If you encounter an error related to the plotly library, such as a JavaScript error, check that your JupyterLab version is compatible with the plotly extension. Sometimes running the jupyter lab build command fixes the problem.

In [20]:
helper_3d_reconstruction.call_x_y_z_calculation(variables, flight_path_length, element_selected)
In [21]:
variables.plotly_3d_reconstruction
Out[21]:
In [22]:
variables.data['x (nm)'] = variables.x
variables.data['y (nm)'] = variables.y
variables.data['z (nm)'] = variables.z
# save data temporarily
data_tools.save_data(variables.data, variables, hdf=True, epos=False, pos=False, ato_6v=False, csv=False)


Ion Selection and Ranging

This tutorial outlines a comprehensive workflow for ion selection and organization. Users can choose ions using peak and element finders, manually add ions, customize ion colors, and create histograms. Histograms can be generated for selected ranges and areas, and figures can be saved. Additionally, users have the option to save both figures and data in CSV and HDF5 formats.

In [23]:
# extract the needed data from the pandas DataFrame as NumPy arrays
data_tools.extract_data(variables.data, variables, flight_path_length.value, max_mc.value)

display(variables.data)
The maximum time of flight: 5010
x (nm) y (nm) z (nm) mc_c (Da) mc (Da) high_voltage (V) pulse start_counter t_c (ns) t (ns) x_det (cm) y_det (cm) pulse_pi ion_pp
0 21.349402 5.363643 3.781171 27.092448 28.724681 5013.479980 1002.695984 4162 571.235437 616.458740 2.651428 0.666122 0 0
1 -22.845939 -1.205176 4.094125 26.948118 29.664233 5013.479980 1002.695984 4221 569.811044 627.266968 -2.847347 -0.150204 59 2
2 9.521428 1.577837 0.709855 26.994150 28.994070 5013.479980 1002.695984 4414 570.265747 605.204773 1.142857 0.189388 193 1
3 11.573396 1.761758 1.047097 26.998421 28.934313 5013.479980 1002.695984 4438 570.307919 606.110046 1.394286 0.212245 24 1
4 -10.820421 -14.684858 2.572101 27.333027 29.575221 5013.479980 1002.695984 4728 573.601335 619.249939 -1.325714 -1.799184 290 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10316765 17.708551 13.272948 57.991811 28.297223 30.336661 6353.959961 1270.791992 8823 582.980918 560.408325 1.707755 1.280000 46 1
10316766 10.847843 -7.369646 56.044563 27.105522 29.042071 6353.959961 1270.791992 8970 571.364274 543.112427 1.028571 -0.698776 147 1
10316767 -16.910530 20.125206 59.249583 27.549218 30.381020 6353.959961 1270.791992 9039 575.718565 564.934631 -1.648980 1.962449 69 1
10316768 14.320546 -15.610074 57.737882 13.681184 14.419212 6353.959961 1270.791992 9654 416.828667 398.833862 1.377959 -1.502041 346 1
10316769 -12.117165 -15.691047 57.394154 26.842914 29.078471 6353.959961 1270.791992 9901 568.770402 547.645569 -1.162449 -1.505306 247 1

10316770 rows × 14 columns

In [24]:
helper_ion_selection.call_ion_selection(variables)
In [25]:
variables.range_data
Out[25]:
ion mass mc mc_low mc_up color element complex isotope charge
0 ${}^{1}H^{+}$ 1.01 1.002494 0.839158 1.170298 #b2aa2d [H] [1] [1] 1
1 ${}^{27}Al^{2+}$ 13.49 13.433341 13.233211 14.251161 #e7e0d1 [Al] [1] [27] 2
2 ${}^{27}Al^{+}$ 26.98 26.966924 26.325274 28.606463 #e7e0d1 [Al] [1] [27] 1
In [26]:
variables.range_data.dtypes
Out[26]:
ion         object
mass       float64
mc         float64
mc_low     float64
mc_up      float64
color       object
element     object
complex     object
isotope     object
charge      uint32
dtype: object
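
Once ranges are defined, assigning an ion to a species is an interval lookup on its mass-to-charge ratio. The sketch below uses the `mc_low` and `mc_up` values from the range table above. The function name `assign_ions` and the plain-text labels are hypothetical and are not part of PyCCAPT.

```python
import numpy as np

# (label, mc_low, mc_up) copied from the range table shown above
RANGES = [
    ("1H+", 0.839158, 1.170298),
    ("27Al2+", 13.233211, 14.251161),
    ("27Al+", 26.325274, 28.606463),
]


def assign_ions(mc, ranges, default="unranged"):
    """Label each mass-to-charge value with the range that contains it."""
    mc = np.asarray(mc, dtype=float)
    labels = np.full(mc.shape, default, dtype=object)
    for label, low, up in ranges:
        labels[(mc >= low) & (mc <= up)] = label
    return labels


# Illustrative m/c values (Da); the last one falls outside every range
ion_labels = assign_ions([1.0, 13.68, 27.09, 50.0], RANGES)
```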

Save the range data in HDF5 and CSV formats.

In [27]:
# save the new data
name_save_file = variables.result_data_path + '/' + 'range_' + variables.dataset_name + '.h5'
data_tools.store_df_to_hdf(variables.range_data,  'df', name_save_file)
# save data in csv format
name_save_file = variables.result_data_path + '/' + 'range_' + variables.dataset_name + '.csv'
data_tools.store_df_to_csv(variables.range_data, name_save_file)

Save the cropped dataset. You can specify the output format from the list below. The output formats are HDF5, EPOS, POS, ATO, and CSV. The output file will be saved in the same directory as the original dataset file, in a new directory named load_crop.

In [28]:
interact_manual(data_tools.save_data, data=fixed(variables.data), variables=fixed(variables),
                hdf=widgets.Dropdown(options=[('True', True), ('False', False)]),
                epos=widgets.Dropdown(options=[('False', False), ('True', True)]),
                pos=widgets.Dropdown(options=[('False', False), ('True', True)]),
                ato_6v=widgets.Dropdown(options=[('False', False), ('True', True)]),
                csv=widgets.Dropdown(options=[('False', False), ('True', True)]));
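
To make one of the listed formats concrete, the sketch below writes and reads a file in the conventional POS layout, in which each ion is stored as four big-endian 32-bit floats (x, y, z, mass-to-charge). It uses only the standard library and is not the `save_data` implementation. The function names, the file name, and the sample values are illustrative.

```python
import os
import struct
import tempfile


def write_pos(path, x, y, z, mc):
    """Write ions as big-endian float32 (x, y, z, m/c) records."""
    with open(path, "wb") as handle:
        for record in zip(x, y, z, mc):
            handle.write(struct.pack(">4f", *record))


def read_pos(path):
    """Read a POS file back into a list of (x, y, z, m/c) tuples."""
    with open(path, "rb") as handle:
        raw = handle.read()
    return [struct.unpack_from(">4f", raw, offset)
            for offset in range(0, len(raw), 16)]


# Two illustrative ions written to a temporary file
pos_path = os.path.join(tempfile.mkdtemp(), "example.pos")
write_pos(pos_path,
          x=[21.35, -22.85], y=[5.36, -1.21],
          z=[3.78, 4.09], mc=[27.09, 26.95])
pos_records = read_pos(pos_path)
```

Each record occupies 16 bytes, so the file size divided by 16 gives the number of ions.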


Visualization

In [29]:
variables.data
Out[29]:
x (nm) y (nm) z (nm) mc_c (Da) mc (Da) high_voltage (V) pulse start_counter t_c (ns) t (ns) x_det (cm) y_det (cm) pulse_pi ion_pp
0 21.349402 5.363643 3.781171 27.092448 28.724681 5013.479980 1002.695984 4162 571.235437 616.458740 2.651428 0.666122 0 0
1 -22.845939 -1.205176 4.094125 26.948118 29.664233 5013.479980 1002.695984 4221 569.811044 627.266968 -2.847347 -0.150204 59 2
2 9.521428 1.577837 0.709855 26.994150 28.994070 5013.479980 1002.695984 4414 570.265747 605.204773 1.142857 0.189388 193 1
3 11.573396 1.761758 1.047097 26.998421 28.934313 5013.479980 1002.695984 4438 570.307919 606.110046 1.394286 0.212245 24 1
4 -10.820421 -14.684858 2.572101 27.333027 29.575221 5013.479980 1002.695984 4728 573.601335 619.249939 -1.325714 -1.799184 290 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10316765 17.708551 13.272948 57.991811 28.297223 30.336661 6353.959961 1270.791992 8823 582.980918 560.408325 1.707755 1.280000 46 1
10316766 10.847843 -7.369646 56.044563 27.105522 29.042071 6353.959961 1270.791992 8970 571.364274 543.112427 1.028571 -0.698776 147 1
10316767 -16.910530 20.125206 59.249583 27.549218 30.381020 6353.959961 1270.791992 9039 575.718565 564.934631 -1.648980 1.962449 69 1
10316768 14.320546 -15.610074 57.737882 13.681184 14.419212 6353.959961 1270.791992 9654 416.828667 398.833862 1.377959 -1.502041 346 1
10316769 -12.117165 -15.691047 57.394154 26.842914 29.078471 6353.959961 1270.791992 9901 568.770402 547.645569 -1.162449 -1.505306 247 1

10316770 rows × 14 columns

In [30]:
helper_visualization.call_visualization(variables)
In [31]:
variables.plotly_3d_reconstruction
Out[31]: